{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TensorBoard for Visualization\n",
    "\n",
    "Let's take up the same task as defined in Recitation 2. We'll be training a Neural Network to classify if a set of points $(x_1, x_2)$ lie inside a circle of radius $1$ or not. For more details on what the task is, please re-visit Recitation 2.\n",
    "\n",
    "To activate TensorBoard on a program, we add this line after building the graph, right before running the train loop.\n",
    "\n",
    "```\n",
    "                                    writer = tf.summary.FileWriter(\"./logs\")\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This line creates a writer object that creates write event files and saves in the \"./logs\" directory. This is the directory that TensorBoard will search for an event file to log. We'll understand the usage of TensorBoard on both TensorFlow and pytorch. Let's start with **TensorFlow**.\n",
    "\n",
    "**PS:** There are better ways to use the **summary** API in TensorFlow. For the sake of using the same method in both TensorFlow and pytorch, we'll stick with this method. Look into [**SummarySaverHook**](https://www.tensorflow.org/api_docs/python/tf/train/SummarySaverHook) or come to office hours.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "import torch, os\n",
    "import torch.nn as nn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to Recitation 2, we first sample some polar co-ordinates that are randomly distributed within a circle of radius 2 and centered at origin, ie. $(0,0)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sample_points(n):\n",
    "    \"\"\"\n",
    "    :param n: Total number of data-points\n",
    "    :return: A tuple (X,y) where X -> [n,2] and y -> [n]\n",
    "    \"\"\"    \n",
    "    radius = np.random.uniform(low=0,high=2,size=n).reshape(-1,1) # uniform radius between 0 and 2\n",
    "    angle = np.random.uniform(low=0,high=2*np.pi,size=n).reshape(-1,1) # uniform angle\n",
    "    x1 = radius*np.cos(angle)\n",
    "    x2=radius*np.sin(angle)\n",
    "    y = (radius<1).astype(int).reshape(-1)\n",
    "    x = np.concatenate([x1,x2],axis=1)\n",
    "    return x,y"
   ]
  },
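  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check on the labeling rule, the sketch below re-implements the sampler with NumPy only and checks two things: the label agrees with the Cartesian test $x_1^2 + x_2^2 < 1$, and roughly half of the points are labeled 1, because the radius (not the area) is sampled uniformly. The name `sample_points_seeded` and its `seed` argument are assumptions added here for reproducibility; they are not part of the recitation code.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sample_points_seeded(n, seed=0):\n",
    "    # Hypothetical seeded variant of sample_points: returns X -> [n,2] and y -> [n]\n",
    "    rng = np.random.RandomState(seed)\n",
    "    radius = rng.uniform(low=0, high=2, size=n).reshape(-1, 1)\n",
    "    angle = rng.uniform(low=0, high=2 * np.pi, size=n).reshape(-1, 1)\n",
    "    x = np.concatenate([radius * np.cos(angle), radius * np.sin(angle)], axis=1)\n",
    "    y = (radius < 1).astype(int).reshape(-1)\n",
    "    return x, y\n",
    "\n",
    "X_chk, y_chk = sample_points_seeded(10000)\n",
    "inside = (X_chk ** 2).sum(axis=1) < 1  # Cartesian version of the same rule\n",
    "print('agreement with x1^2 + x2^2 < 1:', np.mean(inside == y_chk))\n",
    "print('fraction labeled 1:', y_chk.mean())\n",
    "```\n",
    "\n",
    "If the points were instead uniform over the area of the disk, only about a quarter of them would fall inside the unit circle, so sampling the radius uniformly gives a more balanced dataset."
   ]
  },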
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generating the data\n",
    "\n",
    "X_tr, y_tr = sample_points(10000)\n",
    "X_val,y_val = sample_points(500)\n",
    "\n",
    "print(X_tr.shape, y_tr.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot some scalars. For visualizing the same metric but on different data sets, you can create separate **tf.summary.FileWriter()** objects and place them in the same folder that you would use as the **log directory** for TensorBoard. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the FileWriters for \"training\" and \"validation\" routines\n",
    "train_writer_tf = tf.summary.FileWriter(\"./logs/train\")\n",
    "\n",
    "\n",
    "def build_graph(n_units=12, n_layers=2, weight_init=tf.glorot_uniform_initializer(),\n",
    "    bias_init=None, activation=tf.nn.relu, learning_rate=1e-3\n",
    "   ):\n",
    "    X = tf.placeholder(dtype=tf.float32, shape=[None,2], name=\"X\")\n",
    "    y = tf.placeholder(dtype=tf.int64, shape=[None], name=\"y\")\n",
    "    gs = tf.train.get_or_create_global_step()\n",
    "    \n",
    "    with tf.variable_scope(\"network\", reuse=tf.AUTO_REUSE):\n",
    "        net = X\n",
    "        for layer in range(n_layers):\n",
    "            net = tf.layers.dense(net, units=n_units, name=\"LAYER-{}\".format(layer+1), activation=activation,\n",
    "                                 kernel_initializer=weight_init, bias_initializer=bias_init\n",
    "                                 )\n",
    "        logits = tf.layers.dense(net, units=2, name=\"LAYER-Last\", activation=None,\n",
    "                                 kernel_initializer=weight_init, bias_initializer=bias_init\n",
    "                                )\n",
    "        \n",
    "        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=y)\n",
    "        acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits,1), y),tf.float32))\n",
    "\n",
    "        opt = tf.train.AdamOptimizer(learning_rate=learning_rate)\n",
    "        train = opt.minimize(loss, global_step=gs)\n",
    "        \n",
    "        # Evaluating the gradients to log in TB \n",
    "        grads = opt.compute_gradients(loss)\n",
    "        for grad in grads: tf.summary.histogram(\"{}-grad\".format(grad[1].name), grad[0])\n",
    "        \n",
    "    \n",
    "    # Add \"loss\" and \"acc\" as scalar summaries\n",
    "    tf.summary.scalar(\"loss\", tf.reduce_mean(loss))\n",
    "    tf.summary.scalar(\"accuracy\", acc)\n",
    "    \n",
    "    # Collect all trainable variables\n",
    "    all_weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
    "    for weight in all_weights: tf.summary.histogram(weight.name, weight)\n",
    "        \n",
    "        \n",
    "    # Merge all summaries into a single op\n",
    "    summary = tf.summary.merge_all()\n",
    "    \n",
    "    return {\n",
    "        \"X\": X, \"y\": y, \"train\": train, \"loss\": tf.reduce_mean(loss), \"acc\": acc, \n",
    "        \"gs\": gs, \"summ\": summary\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We created *val_writer* and *train_writer* for collecting validation and train summaries. Note here that the log path we provided here has the same prefix / directory. This can be different as well - doesn't really matter. \n",
    "\n",
    "To add a scalar value in our logs, we use [*tf.summary.scalar()*](https://www.tensorflow.org/api_docs/python/tf/summary/scalar) where we add the node name and provide the tensor to log. We then merge all the summaries in one single operand (to reduce the hassle of running each operand separately) which we *run* using a session. Running these operands alone does not log it in a file. We then use a FileWriter object to add this summary op using the [*add_summary()*](https://www.tensorflow.org/api_docs/python/tf/summary/FileWriter#add_summary) method and *flush* it to write this event on disk.\n",
    "\n",
    "Similar to plotting a scalar, we use [*tf.summary.histogram()*](https://www.tensorflow.org/api_docs/python/tf/summary/histogram) to plot a histogram. We first collect all the trainable variables and sequentially add them as summaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def start_scalar_training(epochs=100):\n",
    "    with tf.Session() as sess:\n",
    "        sess.run(tf.global_variables_initializer())\n",
    "        for epoch in range(epochs):\n",
    "            # *train* operands\n",
    "            gs, _, summary = sess.run([ops[\"gs\"],ops[\"train\"], ops[\"summ\"]], {ops[\"X\"]:X_tr, ops[\"y\"]:y_tr})\n",
    "            train_writer_tf.add_summary(summary, epoch) # Logging the summary in the event file\n",
    "            train_writer_tf.flush() # Write to disk\n",
    "            \n",
    "# Reset graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# Build the graph and start training\n",
    "ops = build_graph(weight_init=tf.random_uniform_initializer(minval=-0.01,maxval=0.01))\n",
    "start_scalar_training(epochs=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once you start training, you must go to your terminal and start TensorBoard. You need to provide the directory path where the event files are logged as an argument to **logdir**. TensorBoard automatically starts up on port (default) 6006. [`tensorboard --logdir=./logs_tf`]\n",
    "\n",
    "With different weight initialization techniques, the gradient update changes and could lead to a faster convergence. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Random uniform initialization of biases and using Glorot initialization for Weight matrices.\n",
    "train_writer_tf = tf.summary.FileWriter(\"./logs/train_init\")\n",
    "\n",
    "# Reset graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# Build the graph and start training\n",
    "ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2))\n",
    "start_scalar_training(epochs=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using Different Activation functions\n",
    "\n",
    "Let's see and understand how each of these activation functions perform.\n",
    "\n",
    "- Sigmoid\n",
    "    * Get values between 0 and 1.\n",
    "    * A Sigmoid layer easily dies or saturates. A value too small kills the gradient flow whereas a value too big saturates the neurons, effectively passing no information through it.\n",
    "    \n",
    "- Tanh\n",
    "    * Outputs values between -1 and 1. Also zero centered and so does not have the problem of all positive/negative gradients.\n",
    "    * Better than Sigmoid but problem of saturation persists.\n",
    "\n",
    "- ReLU\n",
    "    * Converges quickly as is a threshold based activation and does not saturate.\n",
    "    * Neurons die off. Large weight update could set the weights in such a way (they become negative) during backpropagation that they never fire for any data point. Important to set lower learning rates for ReLU.\n",
    "    * Leaky ReLU asjusts this problem by having a very small negative value for `x < 0`.\n",
    "    \n",
    "    \n",
    "**TRY IT OUT**\n",
    "\n",
    "Use `He Intialization` and `Xavier Initialization` with all the 3 activation functions. See which performs better and try to find out why. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Logging for all Activations\n",
    "# 0.1\n",
    "train_writer_tf = tf.summary.FileWriter(\"./logs/train_sigmoid\")\n",
    "\n",
    "# Reset graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# Build the graph and start training\n",
    "ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2), activation=tf.nn.sigmoid)\n",
    "start_scalar_training(epochs=200)\n",
    "\n",
    "# 0.2\n",
    "train_writer_tf = tf.summary.FileWriter(\"./logs/train_tanh\")\n",
    "\n",
    "# Reset graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# Build the graph and start training\n",
    "ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2), activation=tf.nn.tanh)\n",
    "start_scalar_training(epochs=200)\n",
    "\n",
    "\n",
    "# 0.3\n",
    "train_writer_tf = tf.summary.FileWriter(\"./logs/train_lrelu\")\n",
    "\n",
    "# Reset graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# Build the graph and start training\n",
    "ops = build_graph(bias_init=tf.random_uniform_initializer(minval=-0.2,maxval=0.2), activation=tf.nn.leaky_relu)\n",
    "start_scalar_training(epochs=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting Images\n",
    "Along with scalars and histograms, you can also use TensorBoard to visualize *images*. Visualizing images is particularly helpful if you want to understand which training images are causing your loss to deviate from its normal (hopefully decreasing) path. It also helps you evaluate your model's performance in a classification task by plotting confusion matrices and visualizing which classes are difficult for your model to understand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_another_graph(n_units=12, n_layers=2):\n",
    "    ## Same as before ----------> \n",
    "    X = tf.placeholder(dtype=tf.float32, shape=[None,2], name=\"X\")\n",
    "    y = tf.placeholder(dtype=tf.int64, shape=[None], name=\"y\")\n",
    "    gs = tf.train.get_or_create_global_step()\n",
    "    \n",
    "    with tf.variable_scope(\"network\", reuse=tf.AUTO_REUSE):\n",
    "        net = X\n",
    "        for layer in range(n_layers):\n",
    "            net = tf.layers.dense(net, units=n_units, name=\"LAYER-{}\".format(layer+1), activation=tf.nn.relu)\n",
    "        logits = tf.layers.dense(net, units=2, name=\"LAYER-Last\", activation=None)\n",
    "\n",
    "        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=y)\n",
    "        acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits,1), y),tf.float32))\n",
    "\n",
    "        opt = tf.train.AdamOptimizer(learning_rate=1e-3)\n",
    "        train = opt.minimize(loss, global_step=gs)\n",
    "    ## <----------\n",
    "    confusion = tf.confusion_matrix(y, tf.argmax(tf.nn.softmax(logits), axis=1), num_classes=2, name='confusion')\n",
    "    # reshape the matrix as a 4D image\n",
    "    confusion_image = tf.reshape( tf.cast(confusion, tf.float32), [1, 2, 2, 1])\n",
    "    tf.summary.image('confusion', confusion_image)\n",
    "    \n",
    "    # Merge all summaries into a single op\n",
    "    summary = tf.summary.merge_all()\n",
    "    \n",
    "    return {\n",
    "        \"X\": X, \"y\": y, \"train\": train, \"loss\": tf.reduce_mean(loss), \"acc\": acc, \n",
    "        \"gs\": gs, \"summ\": summary\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def start_image_training(epochs=50):\n",
    "    with tf.Session() as sess:\n",
    "        sess.run(tf.global_variables_initializer())\n",
    "        for epoch in range(epochs):\n",
    "            # *train* operands\n",
    "            gs, _, summary = sess.run([ops[\"gs\"],ops[\"train\"], ops[\"summ\"]], {ops[\"X\"]:X_tr, ops[\"y\"]:y_tr})\n",
    "            train_writer_tf.add_summary(summary, epoch) # Logging the summary in the event file\n",
    "            train_writer_tf.flush() # Write to disk\n",
    "            \n",
    "train_writer_tf = tf.summary.FileWriter(\"./logs/train_confusion\")\n",
    "            \n",
    "# Reset graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# Build the graph and start training\n",
    "ops = build_another_graph()\n",
    "start_image_training()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To plot the number of samples in each cell when plotting the confusion matrix, we can either print it on the Python console or use [*tf.summary.text()*](https://www.tensorflow.org/api_docs/python/tf/summary/text)  \n",
    "\n",
    "### Using TensorBoard in Pytorch\n",
    "\n",
    "Plotting in PyTorch is a bit different than TF. This is because PyTorch does not use any TensorFlow operand for calculations. Hence, we will need to represent numpy objects as tensorflow operands to setup logging. Also, we make use of the `tf.Summary` class instead of the `tf.summary` calls.\n",
    "\n",
    "Here's an example of that conversion. This Logger class has been taken from a [Github gist](https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "class Logger(object):\n",
    "    \"\"\"Logging in tensorboard without tensorflow ops.\"\"\"\n",
    "\n",
    "    def __init__(self, log_dir):\n",
    "        self.writer = tf.summary.FileWriter(log_dir)\n",
    "\n",
    "    def log_scalar(self, tag, value, step):\n",
    "        \"\"\"Log a scalar variable.\n",
    "        Parameter\n",
    "        ----------\n",
    "        tag : Name of the scalar\n",
    "        value : value itself\n",
    "        step :  training iteration\n",
    "        \"\"\"\n",
    "        # Notice we're using the Summary \"class\" instead of the \"tf.summary\" public API.\n",
    "        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])\n",
    "        self.writer.add_summary(summary, step)\n",
    "\n",
    "    def log_histogram(self, tag, values, step, bins=1000):\n",
    "        \"\"\"Logs the histogram of a list/vector of values.\"\"\"\n",
    "        # Convert to a numpy array\n",
    "        values = np.array(values)\n",
    "        \n",
    "        # Create histogram using numpy        \n",
    "        counts, bin_edges = np.histogram(values, bins=bins)\n",
    "\n",
    "        # Fill fields of histogram proto\n",
    "        hist = tf.HistogramProto()\n",
    "        hist.min = float(np.min(values))\n",
    "        hist.max = float(np.max(values))\n",
    "        hist.num = int(np.prod(values.shape))\n",
    "        hist.sum = float(np.sum(values))\n",
    "        hist.sum_squares = float(np.sum(values**2))\n",
    "\n",
    "        # Requires equal number as bins, where the first goes from -DBL_MAX to bin_edges[1]\n",
    "        # See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto#L30\n",
    "        # Thus, we drop the start of the first bin\n",
    "        bin_edges = bin_edges[1:]\n",
    "\n",
    "        # Add bin edges and counts\n",
    "        for edge in bin_edges:\n",
    "            hist.bucket_limit.append(edge)\n",
    "        for c in counts:\n",
    "            hist.bucket.append(c)\n",
    "\n",
    "        # Create and write Summary\n",
    "        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, histo=hist)])\n",
    "        self.writer.add_summary(summary, step)\n",
    "        self.writer.flush()"
   ]
  },
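  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bucket layout that `log_histogram` fills in can be checked with NumPy alone. The sketch below shows why the first bin edge is dropped: `np.histogram` returns one more edge than it has counts, while the histogram proto stores one upper limit per bucket. The sample `values` array is made up for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "values = np.array([0.0, 0.1, 0.1, 0.6, 1.0])  # illustrative data only\n",
    "counts, bin_edges = np.histogram(values, bins=4)\n",
    "\n",
    "# np.histogram returns len(counts) + 1 edges; keep only the upper edge\n",
    "# of every bin so that there is exactly one limit per bucket.\n",
    "bucket_limit = bin_edges[1:]\n",
    "\n",
    "for limit, count in zip(bucket_limit, counts):\n",
    "    print('bucket with upper limit {:.2f} holds {} values'.format(limit, int(count)))\n",
    "```\n",
    "\n",
    "With the number of limits equal to the number of counts, the two lists can be appended to `bucket_limit` and `bucket` one pair at a time, which is what the class above does."
   ]
  },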
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the same model code as in Recitation 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_single_hidden_MLP(n_hidden_neurons):\n",
    "    return nn.Sequential(nn.Linear(2,n_hidden_neurons),nn.ReLU(),nn.Linear(n_hidden_neurons,2))\n",
    "\n",
    "trainx = torch.from_numpy(X_tr).float()\n",
    "valx = torch.from_numpy(X_val).float()\n",
    "trainy = torch.from_numpy(y_tr)\n",
    "valy = torch.from_numpy(y_val)\n",
    "\n",
    "tLog, vLog = Logger(\"./logs/train_pytorch\"), Logger(\"./logs/val_pytorch\")\n",
    "\n",
    "model1 = generate_single_hidden_MLP(6)\n",
    "print(trainx.type(),trainy.type())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def training_routine(net,dataset,n_iters,gpu):\n",
    "    # organize the data\n",
    "    train_data,train_labels,val_data,val_labels = dataset\n",
    "    \n",
    "    criterion = nn.CrossEntropyLoss()\n",
    "    optimizer = torch.optim.SGD(net.parameters(),lr=0.01)\n",
    "    \n",
    "    # use the flag\n",
    "    if gpu:\n",
    "        train_data,train_labels = train_data.cuda(),train_labels.cuda()\n",
    "        val_data,val_labels = val_data.cuda(),val_labels.cuda()\n",
    "        net = net.cuda() # the network parameters also need to be on the gpu !\n",
    "    for i in range(n_iters):\n",
    "        # forward pass\n",
    "        train_output = net(train_data)\n",
    "        train_loss = criterion(train_output,train_labels)\n",
    "        # backward pass and optimization\n",
    "        train_loss.backward()\n",
    "        optimizer.step()\n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        # Once every 100 iterations, log values\n",
    "        if i%100==0:\n",
    "            # compute the accuracy of the prediction\n",
    "            train_prediction = train_output.cpu().detach().argmax(dim=1)\n",
    "            train_accuracy = (train_prediction.numpy()==train_labels.numpy()).mean() \n",
    "            # Now for the validation set\n",
    "            val_output = net(val_data)\n",
    "            val_loss = criterion(val_output,val_labels)\n",
    "            # compute the accuracy of the prediction\n",
    "            val_prediction = val_output.cpu().detach().argmax(dim=1)\n",
    "            val_accuracy = (val_prediction.numpy()==val_labels.numpy()).mean() \n",
    "            \n",
    "            # 1. Log scalar values (scalar summary)\n",
    "            tr_info = { 'loss': train_loss.cpu().detach().numpy(), 'accuracy': train_accuracy }\n",
    "\n",
    "            for tag, value in tr_info.items():\n",
    "                tLog.log_scalar(tag, value, i+1)\n",
    "\n",
    "            # 2. Log values and gradients of the parameters (histogram summary)\n",
    "            for tag, value in net.named_parameters():\n",
    "                tag = tag.replace('.', '/')\n",
    "                tLog.log_histogram(tag, value.data.cpu().numpy(), i+1)\n",
    "                tLog.log_histogram(tag+'/grad', value.grad.data.cpu().numpy(), i+1)            \n",
    "    \n",
    "    net = net.cpu()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = trainx,trainy,valx,valy\n",
    "gpu = False\n",
    "gpu = gpu and torch.cuda.is_available() # to know if you actually can use the GPU\n",
    "\n",
    "training_routine(model1,dataset,1000,gpu)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
